The Author-Topic Model and Author Prediction

Author

  • Jiasi Song
Abstract

The author-topic model, proposed by Michal Rosen-Zvi et al., is a generative model for documents that extends Latent Dirichlet Allocation to include authorship information. The model associates each author with a multinomial distribution over topics and each topic with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with its authors. In this project, I re-implement the model and apply it to a collection of about 250 NIPS conference papers, chosen randomly from a collection of about 1,700 NIPS papers. Exact inference is intractable for these datasets, so I use Gibbs sampling to estimate the topic and author distributions. Tagging results for different numbers of topics are reported. Using the estimated distributions, I then present a new method that applies maximum-likelihood estimation to predict the authors of about 100 additional papers, all of whose authors appear among the authors of the training papers. The prediction precision is reported.

Keywords: author-topic model; Gibbs sampling; multinomial distribution; tagging; author prediction
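The following minimal sketch covers both steps in the abstract. It is an illustration under stated assumptions, not the author's implementation. The first function is a collapsed Gibbs sampler for the author-topic model in the form described by Rosen-Zvi et al. For each word token it jointly resamples an (author, topic) pair, drawing the author from the document's author list. It then estimates theta (author-topic) and phi (topic-word) from the final counts.

The second function is one assumed reading of "maximum-likelihood author prediction". It scores each candidate author `a` by the log-likelihood of the document's words under p(w | a) = sum_t theta[a][t] * phi[t][w] and picks the highest-scoring candidate. The paper's actual scoring rule may differ. The function names `author_topic_gibbs` and `predict_author` are hypothetical, as are the hyperparameter defaults (`alpha=0.5`, `beta=0.01`) and the single-sample point estimates.

```python
import math
import random


def author_topic_gibbs(docs, doc_authors, n_authors, n_topics, vocab_size,
                       alpha=0.5, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for the author-topic model (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    doc_authors: list of author-id lists, one per document.
    Returns (theta, phi) with theta[a][t] = p(topic t | author a) and
    phi[t][w] = p(word w | topic t). Hyperparameter defaults are assumptions.
    """
    rng = random.Random(seed)
    nwt = [[0] * n_topics for _ in range(vocab_size)]  # word-topic counts
    nt = [0] * n_topics                                # tokens per topic
    nat = [[0] * n_topics for _ in range(n_authors)]   # author-topic counts
    na = [0] * n_authors                               # tokens per author
    assign = []                                        # (author, topic) per token

    # Random initialisation: author drawn from the document's author list.
    for words, authors in zip(docs, doc_authors):
        row = []
        for w in words:
            a, t = rng.choice(authors), rng.randrange(n_topics)
            nwt[w][t] += 1; nt[t] += 1; nat[a][t] += 1; na[a] += 1
            row.append((a, t))
        assign.append(row)

    for _ in range(iters):
        for d, (words, authors) in enumerate(zip(docs, doc_authors)):
            for i, w in enumerate(words):
                a, t = assign[d][i]
                # Remove the current token from all counts.
                nwt[w][t] -= 1; nt[t] -= 1; nat[a][t] -= 1; na[a] -= 1
                # Unnormalised joint conditional over (author, topic) pairs.
                pairs, weights = [], []
                for a2 in authors:
                    for t2 in range(n_topics):
                        p_w = (nwt[w][t2] + beta) / (nt[t2] + vocab_size * beta)
                        p_t = (nat[a2][t2] + alpha) / (na[a2] + n_topics * alpha)
                        pairs.append((a2, t2))
                        weights.append(p_w * p_t)
                a, t = rng.choices(pairs, weights=weights)[0]
                # Add the token back with its new assignment.
                nwt[w][t] += 1; nt[t] += 1; nat[a][t] += 1; na[a] += 1
                assign[d][i] = (a, t)

    theta = [[(nat[a][t] + alpha) / (na[a] + n_topics * alpha)
              for t in range(n_topics)] for a in range(n_authors)]
    phi = [[(nwt[w][t] + beta) / (nt[t] + vocab_size * beta)
            for w in range(vocab_size)] for t in range(n_topics)]
    return theta, phi


def predict_author(words, candidates, theta, phi):
    """Assumed ML rule: argmax_a sum_w log sum_t theta[a][t] * phi[t][w]."""
    def loglik(a):
        return sum(math.log(sum(theta[a][t] * phi[t][w]
                                for t in range(len(phi))))
                   for w in words)
    return max(candidates, key=loglik)
```

For a held-out paper, `predict_author(word_ids, candidate_authors, theta, phi)` returns the single most likely author. To predict several co-authors, one could take the top-k candidates by `loglik` instead. That extension is not described in the abstract.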

Similar resources

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. Existing approaches to author profiling suffer from problems such as high feature dimensionality and fail to capture th...

Joint Author Sentiment Topic Model

Traditional works in sentiment analysis and aspect rating prediction do not take author preferences and writing style into account during rating prediction of reviews. In this work, we introduce Joint Author Sentiment Topic Model (JAST), a generative process of writing a review by an author. Authors have different topic preferences, ‘emotional’ attachment to topics, writing style based on the d...

The Crisis of Representation in Azadeh Khanoom and Her Author by Reza Baraheni

The crisis of representation is a topic widely discussed in critique and theory of postmodern literature. This refers to the crises of the present era including the crisis of meaning, the perplexity of contemporary humankind amidst a mass of valid and invalid data, alienation, etc. Literature, as the epitome of human life, is a reflection of these crises in the contemporary era. Azadeh Khanoom ...

Application of Artificial Neural Networks and Support Vector Machines for carbonate pores size estimation from 3D seismic data

This paper proposes a method for the prediction of pore size values in hydrocarbon reservoirs using 3D seismic data. To this end, an actual carbonate oil field in the south-western part of Iran was selected. Taking real geological conditions into account, different models of reservoir were constructed for a range of viable pore size values. Seismic surveying was performed next on these models. F...

A Heuristic Model for Predicting Bankruptcy

Bankruptcy prediction is one of the major business classification problems. The main purpose of this study is to investigate the Kohonen self-organizing feature map in terms of performance accuracy in the area of bankruptcy prediction. A sample of 108 firms listed on the Tehran Stock Exchange is used for the study. Our results confirm that the Kohonen network is a robust model for predicting bankruptcy in ...

Publication date: 2009